 Marion County


M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding

Cho, Jaemin, Mahata, Debanjan, Irsoy, Ozan, He, Yujie, Bansal, Mohit

arXiv.org Artificial Intelligence

Document visual question answering (DocVQA) pipelines that answer questions from documents have broad applications. Existing methods focus on handling single-page documents with multi-modal language models (MLMs), or rely on text-based retrieval-augmented generation (RAG) that uses text extraction tools such as optical character recognition (OCR). However, there are difficulties in applying these methods in real-world scenarios: (a) questions often require information across different pages or documents, where MLMs cannot handle many long documents; (b) documents often have important information in visual elements such as figures, but text extraction tools ignore them. We introduce M3DocRAG, a novel multi-modal RAG framework that flexibly accommodates various document contexts (closed-domain and open-domain), question hops (single-hop and multi-hop), and evidence modalities (text, chart, figure, etc.). M3DocRAG finds relevant documents and answers questions using a multi-modal retriever and an MLM, so that it can efficiently handle single or many documents while preserving visual information. Since previous DocVQA datasets ask questions in the context of a specific document, we also present M3DocVQA, a new benchmark for evaluating open-domain DocVQA over 3,000+ PDF documents with 40,000+ pages. On three benchmarks (M3DocVQA/MMLongBench-Doc/MP-DocVQA), empirical results show that M3DocRAG with ColPali and Qwen2-VL 7B outperforms many strong baselines, including state-of-the-art performance on MP-DocVQA. We provide comprehensive analyses of different indexing, MLMs, and retrieval models. Lastly, we qualitatively show that M3DocRAG can successfully handle various scenarios, such as when relevant information exists across multiple pages and when answer evidence only exists in images.
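The retrieve-then-answer flow described in this abstract can be pictured with the minimal sketch below. It assumes a late-interaction (MaxSim) score of the kind used by ColBERT-style retrievers such as ColPali; the abstract itself does not spell out the scoring rule. The names `maxsim_score`, `retrieve_top_k`, `answer_question`, and the `generate` callback are hypothetical stand-ins rather than the authors' code, and the embeddings are toy vectors.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
PageId = Tuple[str, int]  # (document name, page number); hypothetical id scheme


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction score (assumed): each query vector is matched to its
    best page vector, and the best-match similarities are summed."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def retrieve_top_k(
    query_vecs: List[Vector],
    page_index: Dict[PageId, List[Vector]],
    k: int,
) -> List[Tuple[PageId, float]]:
    """Score every indexed page against the query and keep the k best."""
    scored = [
        (page_id, maxsim_score(query_vecs, vecs))
        for page_id, vecs in page_index.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def answer_question(
    question: str,
    query_vecs: List[Vector],
    page_index: Dict[PageId, List[Vector]],
    k: int,
    generate: Callable[[str, List[PageId]], str],
) -> str:
    """Retrieve the top-k pages, then hand them to a multi-modal model.
    `generate` is a placeholder for whatever MLM call is actually used."""
    top_pages = [page_id for page_id, _ in retrieve_top_k(query_vecs, page_index, k)]
    return generate(question, top_pages)
```

Because pages from all documents live in one index, the same function covers both the closed-domain case (one document) and the open-domain case (many documents); only the contents of `page_index` change.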


Optimizing Luxury Vehicle Dealership Networks: A Graph Neural Network Approach to Site Selection

Carocci, Luca Silvano, Han, Qiwei

arXiv.org Artificial Intelligence

This study presents a novel application of Graph Neural Networks (GNNs) to optimize dealership network planning for a luxury car manufacturer in the U.S. By conducting a comprehensive literature review on dealership location determinants, the study identifies 65 county-level explanatory variables, augmented by two additional measures of regional interconnectedness derived from social and mobility data. An ablation study involving 34 variable combinations and ten state-of-the-art GNN operators reveals key insights into the predictive power of various variables, particularly highlighting the significance of competition, demographic factors, and mobility patterns in influencing dealership location decisions. The analysis pinpoints seven specific counties as promising targets for network expansion. This research not only illustrates the effectiveness of GNNs in solving complex geospatial decision-making problems but also provides actionable recommendations and valuable methodological insights for industry practitioners.


MiMiC: Minimally Modified Counterfactuals in the Representation Space

Singh, Shashwat, Ravfogel, Shauli, Herzig, Jonathan, Aharoni, Roee, Cotterell, Ryan, Kumaraguru, Ponnurangam

arXiv.org Artificial Intelligence

Language models often exhibit undesirable behaviors, such as gender bias or toxic language. Interventions in the representation space have been shown to be effective in mitigating such issues by altering the LM behavior. We first show that two prominent intervention techniques, Linear Erasure and Steering Vectors, do not enable a high degree of control and are limited in expressivity. We then propose a novel intervention methodology for generating expressive counterfactuals in the representation space, aiming to make representations of a source class (e.g., "toxic") resemble those of a target class (e.g., "non-toxic"). This approach, generalizing previous linear intervention techniques, utilizes a closed-form solution for the Earth Mover's problem under Gaussian assumptions and provides theoretical guarantees on the representation space's geometric organization. We further build on this technique and derive a nonlinear intervention that enables controlled generation. We demonstrate the effectiveness of the proposed approaches in mitigating bias in multiclass classification and in reducing the generation of toxic language, outperforming strong baselines.
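To make the idea of a closed-form Gaussian transport map concrete, here is a minimal sketch under a strong simplifying assumption: both classes are modeled with diagonal covariances, so the map reduces to a per-coordinate shift and rescale. The full-covariance case needs matrix square roots and is not shown. The function names are hypothetical, and this is an illustration of the general technique, not the paper's implementation.

```python
from statistics import fmean, pstdev
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def fit_diag_gaussian(samples: List[Vector]) -> Tuple[List[float], List[float]]:
    """Estimate a per-coordinate mean and standard deviation.
    Assumption: coordinates are treated as independent (diagonal covariance)."""
    columns = list(zip(*samples))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return means, stds


def make_transport_map(
    source: List[Vector], target: List[Vector]
) -> Callable[[Vector], List[float]]:
    """Build the affine map x -> mu_t + (sd_t / sd_s) * (x - mu_s), applied
    coordinate by coordinate, which moves the fitted source Gaussian onto
    the fitted target Gaussian in the diagonal case."""
    mu_s, sd_s = fit_diag_gaussian(source)
    mu_t, sd_t = fit_diag_gaussian(target)
    if any(s == 0 for s in sd_s):
        raise ValueError("source has a zero-variance coordinate")

    def transport(x: Vector) -> List[float]:
        return [
            mt + (st / ss) * (xi - ms)
            for xi, ms, ss, mt, st in zip(x, mu_s, sd_s, mu_t, sd_t)
        ]

    return transport
```

Applied to the source samples themselves, the map yields points whose per-coordinate mean and spread match those of the target class, which is the sense in which source representations are made to "resemble" target ones here.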


Cops use World of Warcraft account to find Florida man hiding missing girl

FOX News

Fox News correspondent CB Cotton reports on how predators are allegedly targeting children on social media on 'The Faulkner Focus.' A 31-year-old Florida man was arrested and faces charges after police say he hid a missing Ohio teen and planned to have sex with her. Detective Henrick Osthed arrested Thomas Ebersole on Wednesday, Jan. 3, 2024, the Marion County Sheriff's Office said in a news release. An FBI special agent had reached out to Det. Investigators found the girl after she logged into World of Warcraft, an online video game, from Ebersole's home address in Dunnellon, police said.


Does Outrage Signal Cyber Attacks? Predicting "Bad Behavior" from Sentiment in Online Content

Hollingshead, Kristy (Florida Institute for Human and Machine Cognition) | Dorr, Bonnie J. (Florida Institute for Human and Machine Cognition) | Dalton, Adam (Florida Institute for Human and Machine Cognition) | Barton, Meg (Leidos, Inc.)

AAAI Conferences

We demonstrate that it is possible to leverage big data in the form of tweets and linked webpages to find expressions of sentiment that signal "bad behavior" such as cyber attacks. We hypothesize that expressions of "outrage" (high intensity, negative affect sentiment) against an organization in public data may be predictive of cyber attacks for two reasons: 1) threat actors may be motivated to launch an attack based on anger/discontent, and 2) outrage associated with an organization or industry may increase the likelihood of that organization or industry being victimized by threat actors (i.e., as a form of "vigilante justice"). We measure sentiment in online content and determine trends in public emotion and their correlation to trends in cyber attacks, as reported in Hackmageddon. We demonstrate that dimensions of sentiment, as afforded by our use of the Circumplex model of emotion, do yield correlations to reported cyber attacks, but that these correlations differ depending on the domain of the data. Thus, the use of this technique requires careful analysis for optimal application.
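Correlating a sentiment trend with an attack trend can be sketched as follows. The abstract does not say which correlation measure or time alignment was used, so Pearson correlation and the optional `lag` parameter are assumptions made for illustration, and the function names are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(
    sentiment: Sequence[float], attacks: Sequence[float], lag: int = 0
) -> float:
    """Correlate sentiment in period t with attacks in period t + lag.
    The lag is an illustrative assumption: outrage might precede attacks."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(sentiment, attacks)
    return pearson(sentiment[:-lag], attacks[lag:])
```

Running the same computation separately per data domain (for example, tweets versus linked webpages) would be one way to observe the domain dependence the abstract reports.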


Cyberdyne's HAL Exoskeleton Helps Patients Walk Again in First Treatments at U.S. Facility

IEEE Spectrum Robotics

Danny Bal was riding his brand new motorcycle to work from his home in Ocala, Florida, two years ago when the driver of an oncoming car fell asleep and ploughed into Bal's electric-blue bike. After the accident, which crushed three of Bal's thoracic vertebrae and shredded a spinal nerve, Bal adjusted to life in a wheelchair. He added a motorized lift to his beloved F-250 truck, explored local trails with a hand-powered bike, and joined a therapeutic horseback riding program. Now, one of Bal's daughters is about to get married, and 57-year-old Bal wants to walk in her ceremony. So on a recent Friday morning in December at Brooks Rehabilitation in Jacksonville, Florida, Bal was back on his feet, taking slow but steady steps as his granddaughter cheered from the sidelines.


An Ostrich-Like Robot Pushes the Limits of Legged Locomotion

MIT Technology Review

What looks like a tiny mechanical ostrich chasing after a car is actually a significant leap forward for robot-kind. The clever and simple two-legged robot, known as the Planar Elliptical Runner, was developed at the Institute for Human and Machine Cognition in Ocala, Florida, to explore how mechanical design can be used to enable sophisticated legged locomotion. A video produced by the researchers shows the robot being tested in a number of situations, including on a treadmill and running behind and alongside a car with a helping hand from an engineer. In contrast to many other legged robots, this one doesn't use sensors and a computer to help balance itself. Instead, its mechanical design provides dynamic stability as it runs.


Government regulators are looking into fatal Tesla crash involving Autopilot

#artificialintelligence

Tesla announced today that the National Highway Traffic Safety Administration has opened an investigation into a recent fatal crash of a Model S with the company's Autopilot feature activated. The accident took place on May 7th in a small West Florida town called Williston. The Florida Highway Patrol is also conducting its own investigation of the accident, according to a public affairs officer there. The same officer reported that Tesla has, since the fatal accident in May, sent engineers down to Ocala, Florida to assist investigators in accessing data they needed to evaluate the causes of the crash. Tesla offered an account of the event in a blog post titled "A Tragic Loss" that went up today, detailing the crash, an "extremely rare circumstance," which occurred on a divided highway.


Speech Adaptation in Extended Ambient Intelligence Environments

Dorr, Bonnie J. (Institute for Human and Machine Cognition) | Galescu, Lucian (Institute for Human and Machine Cognition) | Perera, Ian (Institute for Human and Machine Cognition) | Hollingshead-Seitz, Kristy (Institute for Human and Machine Cognition) | Atkinson, David (Institute for Human and Machine Cognition) | Clark, Micah (Institute for Human and Machine Cognition) | Clancey, William (Institute for Human and Machine Cognition) | Wilks, Yorick (Institute for Human and Machine Cognition) | Fosler-Lussier, Eric (Ohio State University)

AAAI Conferences

This Blue Sky presentation focuses on a major shift toward a notion of “ambient intelligence” that transcends general applications targeted at the general population.  The focus is on highly personalized agents that accommodate individual differences and changes over time.  This notion of Extended Ambient Intelligence (EAI) concerns adaptation to a person’s preferences and experiences, as well as changing capabilities, most notably in an environment where conversational engagement is central.  An important step in moving this research forward is the accommodation of different degrees of cognitive capability (including speech processing) that may vary over time for a given user—whether through improvement or through deterioration. We suggest that the application of divergence detection to speech patterns may enable adaptation to a speaker’s increasing or decreasing level of speech impairment over time. Taking an adaptive approach toward technology development in this arena may be a first step toward empowering those with special needs so that they may live with a high quality of life.  It also represents an important step toward a notion of ambient intelligence that is personalized beyond what can be achieved by mass-produced, one-size-fits-all software currently in use on mobile devices.


Deterioration of Speech as an Indicator of Physiological Degeneration (DESIPHER)

Dorr, Bonnie J. (Florida Institute for Human and Machine Cognition) | Perera, Ian (Institute for Human and Machine Cognition) | Phillips, Samuel (JAH Veterans’ Hospital) | Jasiewicz, Jan (JAH Veterans’ Hospital)

AAAI Conferences

Most physiological assessments commonly used to determine the functional status of patients with Amyotrophic lateral sclerosis (ALS) require trained clinical personnel to administer and interpret the results. Speech impairments eventually affect 80-95% of patients with ALS (Beukelman, 2011). Initial impairments include reduced speaking ... Our speech research focuses on the detection of dialectal variations by identifying speech language divergences along a range of different dimensions. We borrow the notion of divergence from the study of cross-linguistic variations (Dorr, 1993) and apply it towards developing an assessment of bulbar function in patients with ALS, to improve upon existing assessments (Green et al., 2013).